Teaching AI to Regret: The Backspace Token Theory

Humans backtrack. We type "thr" and realize we meant "the" and we fix it. We type "tje" and we laugh at our own fingers and we correct it. Large language models do not do this. They commit to every token like it is a binding legal contract.

I started wondering what would happen if we gave them an out. What if we added a backspace token to the vocabulary? A special signal that says "undo the last thing." The training data would look like raw keystroke logs instead of polished text. "The cat jumped over thr[DELETE] tje[DELETE] the dog."
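As a minimal toy sketch of what that data preparation could look like (everything here is hypothetical, and I'm assuming one [DELETE] undoes exactly one token), you could flatten a keystroke log into a training sequence that keeps the mistakes in:

```python
# Toy sketch: turn a keystroke log into a training sequence that
# preserves mistakes as explicit [DELETE] tokens. All names are
# invented for illustration.
DELETE = "[DELETE]"

def keystrokes_to_sequence(events):
    """events: list of ("type", token) or ("backspace", n) pairs."""
    seq = []
    for kind, payload in events:
        if kind == "type":
            seq.append(payload)              # keep the typo in the data
        else:
            seq.extend([DELETE] * payload)   # one [DELETE] per undone token
    return seq

log = [("type", "thr"), ("backspace", 1),
       ("type", "tje"), ("backspace", 1),
       ("type", "the")]
print(keystrokes_to_sequence(log))
# → ['thr', '[DELETE]', 'tje', '[DELETE]', 'the']
```

A real pipeline would also register [DELETE] as a special token in whatever tokenizer you use, so it never gets split or merged with ordinary text.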

The Confidence Problem

Current models predict the next token based on everything before it. They do not look back. Once "thr" is generated, the model wants to finish "three" or "through". It does not say "oops". It doubles down. My tiny model does this constantly. It writes nonsense and then builds entire paragraphs justifying that nonsense.

Adding a delete token changes the game. Suddenly the model can express uncertainty. It can show its work. It can mimic the human process of thinking out loud and then correcting course. This feels more honest. This feels more like intelligence.
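Resolving the deletes at read time is the easy part. A sketch, assuming again that each [DELETE] cancels exactly one prior token, is just a stack:

```python
# Sketch: apply [DELETE] tokens like a backspace key, so the raw
# generation (mistakes and all) collapses into the final text.
DELETE = "[DELETE]"

def render(tokens):
    out = []
    for tok in tokens:
        if tok == DELETE:
            if out:          # ignore a delete with nothing to delete
                out.pop()
        else:
            out.append(tok)
    return " ".join(out)

raw = ["The", "cat", "jumped", "over",
       "thr", DELETE, "tje", DELETE, "the", "dog"]
print(render(raw))
# → The cat jumped over the dog
```

The interesting part is that you get both views for free: the raw stream shows the hesitation, and the rendered stream shows the answer.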

Intelligence might not be about getting it right the first time. Intelligence might be about noticing you were wrong and fixing it before anyone else sees.

My Tiny Experiment

I tried this. I trained a small model on keystroke data with backspace tokens included. I expected magic. I got anxiety.

The model learned to delete everything. It would write one word and then immediately delete it. It would write a sentence and then backspace over the whole thing. It developed a fear of commitment. I asked it a simple math question and it typed "The answer is 4[DELETE] 5[DELETE] 6[DELETE]" and then stopped generating. It was too busy correcting itself to ever finish.

I had to adjust the training. I penalized excessive deleting. I rewarded completion. The model learned to balance. It still deletes more than a human would. It still hesitates. But sometimes, when it is about to hallucinate a fish fact during a calculus problem, it pauses. It deletes the word "trout". It writes "integral" instead. Progress.
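One way to express that rebalancing (the shape I used, with hyperparameters invented for illustration) is a penalty term added to the training loss: charge for deletions beyond some budget, and give a small credit for sequences that actually reach an end-of-sequence token.

```python
# Sketch of the rebalancing idea: penalize the fraction of [DELETE]
# tokens above a budget, and reward reaching [EOS]. The weights and
# budget are made-up numbers, not tuned values.
DELETE, EOS = "[DELETE]", "[EOS]"

def deletion_penalty(tokens, budget=0.15, weight=2.0, completion_bonus=0.5):
    if not tokens:
        return 0.0
    delete_frac = tokens.count(DELETE) / len(tokens)
    penalty = weight * max(0.0, delete_frac - budget)  # only above budget
    if tokens[-1] == EOS:
        penalty -= completion_bonus                    # reward finishing
    return penalty

# My math-question failure mode: half the output is deletions, no ending.
anxious = ["4", DELETE, "5", DELETE, "6", DELETE]
# A finished answer with no deletions gets a negative penalty (a bonus).
calm = ["The", "answer", "is", "4", EOS]
```

This term would be added to the usual next-token loss during fine-tuning; on its own it does nothing, it just tilts the model away from deleting everything and toward finishing.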

The Philosophical Angle

Current AI hides mistakes. Human intelligence shows the work. We see the crossed-out words in the notebook. We see the draft with changes tracked. That process contains information. It shows where the thinking was hard. It shows where the uncertainty lived.

Maybe we do not want perfect output. Maybe we want honest process. A model that deletes its errors is admitting fallibility. That is dangerous for a company selling certainty. That is wonderful for a person trying to understand how the answer was reached.

Back to Fish

I am going to go check on my original model. The one without backspace tokens. It is probably writing something confidently wrong about aquatic life. At least it finishes its sentences. At least it does not delete its own existence mid-thought.

There is comfort in simplicity. There is also comfort in knowing that even the smartest systems sometimes need to hit control-z. I just wish mine did not do it quite so dramatically.